55 research outputs found

    Network-based analysis of gene expression data

    The methods of molecular biology for the quantitative measurement of gene expression have undergone rapid development over the past two decades. High-throughput assays based on microarray and RNA-seq technology now enable whole-genome studies in which several thousand genes can be measured at a time. However, this has also imposed serious challenges on data storage and analysis, which are the subject of the young but rapidly developing field of computational biology. Explaining observations made on such a large scale requires suitable and accordingly scaled models of gene regulation. Detailed models, as available for single genes, need to be extended and assembled into larger networks of regulatory interactions between genes and gene products. Incorporating such networks into methods for data analysis is crucial to identify the molecular mechanisms that drive the observed expression. As methods for this purpose emerge in parallel and without a known standard of truth, results need to be critically checked in a competitive setup and in the context of the rich available literature. This work is centered on and contributes to the following subjects, each of which represents an important and distinct research topic in the field of computational biology: (i) construction of realistic gene regulatory network models; (ii) detection of subnetworks that are significantly altered in the data under investigation; and (iii) systematic biological interpretation of detected subnetworks.

    For the construction of regulatory networks, I review existing methods with a focus on curation and inference approaches. I first describe how literature curation can be used to construct a regulatory network for a specific process, using the well-studied diauxic shift in yeast as an example. In particular, I address the question of how a detailed understanding, as available for the regulation of single genes, can be scaled up to the level of larger systems. I subsequently inspect methods for large-scale network inference, showing that they are significantly skewed towards master regulators. A recalibration strategy is introduced and applied, yielding an improved genome-wide regulatory network for yeast.

    To detect significantly altered subnetworks, I introduce GGEA as a method for network-based enrichment analysis. The key idea is to score regulatory interactions within functional gene sets for consistency with the observed expression. Compared to other recently published methods, GGEA yields results that consistently and coherently align expression changes with known regulation types and that are thus easier to explain. I also suggest and discuss several significant enhancements to the original method that improve its applicability, results, and runtime.

    For the systematic detection and interpretation of subnetworks, I have developed the EnrichmentBrowser software package. It implements several state-of-the-art methods besides GGEA and allows results to be combined and explored across methods. As part of the Bioconductor repository, the package provides unified access to the different methods and thus greatly simplifies their use for biologists. Extensions to this framework that support the automation of biological interpretation routines are also presented. In conclusion, this work contributes substantially to the research field of network-based analysis of gene expression data with respect to regulatory network construction, subnetwork detection, and biological interpretation. It also covers recent developments and areas of ongoing research, which are discussed in the context of current and future questions arising from the new generation of genomic data.
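
    The core idea of GGEA, scoring regulatory interactions within a gene set for consistency with observed expression changes, can be sketched as follows. This is a deliberately simplified, sign-based illustration under my own assumptions (the published method uses fuzzy logic rather than hard signs); the edge list, fold changes, and function names are hypothetical.

        import numpy as np

        def consistency_score(edges, lfc):
            # Edge sign +1 = activation, -1 = inhibition: an edge is consistent
            # when sign(regulator change) * edge sign matches sign(target change).
            scores = [np.sign(lfc[r]) * s * np.sign(lfc[t])
                      for r, t, s in edges if r in lfc and t in lfc]
            return float(np.mean(scores)) if scores else 0.0

        def permutation_pvalue(edges, lfc, n_perm=1000, seed=0):
            # Null distribution: shuffle fold changes across gene labels.
            rng = np.random.default_rng(seed)
            observed = consistency_score(edges, lfc)
            genes, values = list(lfc), np.asarray(list(lfc.values()))
            hits = sum(consistency_score(edges, dict(zip(genes, rng.permutation(values)))) >= observed
                       for _ in range(n_perm))
            return (1 + hits) / (1 + n_perm)

        # Toy gene set with two edges: TF1 activates G1 and inhibits G2.
        edges = [("TF1", "G1", +1), ("TF1", "G2", -1)]
        lfc = {"TF1": 1.8, "G1": 1.2, "G2": -0.9, "G3": 0.1, "G4": -1.1}
        print(consistency_score(edges, lfc))   # 1.0: both edges fully consistent
        print(permutation_pvalue(edges, lfc))

    A positive score means most edges behave as their regulation type predicts; the permutation test guards against consistency that arises by chance.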

    Bioconductor's EnrichmentBrowser: seamless navigation through combined results of set- & network-based enrichment analysis

    Background: Enrichment analysis of gene expression data is essential to find functional groups of genes whose interplay can explain experimental observations. Numerous methods have been published that either ignore (set-based) or incorporate (network-based) known interactions between genes. However, the often subtle benefits and disadvantages of the individual methods are confusing for most biological end users, and there is currently no convenient way to combine methods for an enhanced result interpretation. Results: We present the EnrichmentBrowser package as easily applicable software that enables (1) the application of the most frequently used set-based and network-based enrichment methods, (2) their straightforward combination, and (3) a detailed and interactive visualization and exploration of the results. The package is available from the Bioconductor repository and implements additional support for standardized expression data preprocessing, differential expression analysis, and definition of suitable input gene sets and networks. Conclusion: The EnrichmentBrowser package implements essential functionality for the enrichment analysis of gene expression data. It combines the advantages of set-based and network-based enrichment analysis in order to derive high-confidence gene sets and biological pathways that are differentially regulated in the expression data under investigation. In addition, the package facilitates the visualization and exploration of such sets and pathways.
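
    The combination step described above can be illustrated with a small rank-aggregation sketch. EnrichmentBrowser itself is an R/Bioconductor package; the Python sketch below, with hypothetical method labels, gene sets, and p-values, only mirrors the idea of averaging per-method ranks into a consensus ordering.

        import pandas as pd

        def combine_rankings(results):
            # Rows: gene sets, columns: methods; rank 1 = smallest p-value.
            # The consensus ordering is the mean of the per-method ranks.
            df = pd.DataFrame(results)
            mean_rank = df.rank(axis=0, method="average").mean(axis=1)
            return mean_rank.sort_values()

        # Hypothetical p-values from a set-based (ORA) and a
        # network-based (GGEA) run on the same three gene sets.
        results = {
            "ora":  {"Glycolysis": 0.001, "TCA cycle": 0.20, "Ribosome": 0.03},
            "ggea": {"Glycolysis": 0.004, "TCA cycle": 0.01, "Ribosome": 0.50},
        }
        print(combine_rankings(results))

    Gene sets that rank highly under both paradigms rise to the top, which is the intuition behind deriving high-confidence sets from combined results.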

    Orchestrating a community-developed computational workshop and accompanying training materials [version 1; referees: 2 approved]

    The importance of bioinformatics, computational biology, and data science in biomedical research continues to grow, driving a need for effective instruction and education. A workshop setting, with lectures and guided hands-on tutorials, is a common approach to teaching practical computational and analytical methods. Here, we detail the process we used to produce high-quality, community-authored educational materials that are available for public consumption and reuse. The coordinated efforts of 17 authors over 10 weeks resulted in 15 workshops available as a website and as a 388-page electronic book. We describe how we utilized cloud infrastructure, GitHub, and a literate programming approach to robustly deliver hands-on tutorials to participants of the annual Bioconductor conference. The scripts, raw and published workshop materials, and cloud machine image are all openly available. Our approach uses free services and software and can be adapted by workshop organizers and authors with appropriate technical backgrounds in other contexts.

    Consensus on Molecular Subtypes of High-grade Serous Ovarian Carcinoma

    Purpose: The majority of ovarian carcinomas are of high-grade serous histology, which is associated with poor prognosis. Surgery and chemotherapy are the mainstay of treatment, and molecular characterization is necessary to lead the way to targeted therapeutic options. To this end, various computational methods for gene expression-based subtyping of high-grade serous ovarian carcinoma (HGSOC) have been proposed, but their overlap and robustness remain unknown. Experimental Design: We assess three major subtype classifiers by meta-analysis of publicly available expression data and evaluate statistical criteria of subtype robustness and classifier concordance. We develop a consensus classifier that represents the subtype classifications of tumors based on the agreement of multiple methods and outputs a confidence score. Using our compendium of expression data, we examine the possibility that a subset of tumors is unclassifiable based on currently proposed subtypes. Results: HGSOC subtyping classifiers exhibit moderate pairwise concordance across our data compendium (58.9%-70.9%, p < 10⁻⁵) and are associated with overall survival in a meta-analysis across datasets (p < 10⁻⁵). Current subtypes do not meet statistical criteria for robustness to re-clustering across multiple datasets (Prediction Strength < 0.6). A new subtype classifier is trained on concordantly classified samples to yield a consensus classification of patient tumors that correlates with patient age, survival, tumor purity, and lymphocyte infiltration. Conclusion: The new consensus classifier represents the consensus of multiple methods and demonstrates the importance of classification approaches for cancer that do not require all tumors to be assigned to a distinct subtype.
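
    A minimal sketch of the consensus idea: take the calls of several classifiers, report the majority subtype with the agreement fraction as a confidence score, and leave tumors below a chosen agreement threshold unclassified rather than forcing an assignment. The helper function, threshold, and the example calls are my own illustrative assumptions, not the published classifier.

        from collections import Counter

        def consensus_call(calls, min_agreement=2/3):
            # Majority subtype plus agreement fraction as a confidence score;
            # tumors without sufficient agreement stay unclassified.
            subtype, votes = Counter(calls).most_common(1)[0]
            confidence = votes / len(calls)
            if confidence < min_agreement:
                subtype = "unclassified"
            return subtype, confidence

        # TCGA-style HGSOC subtype labels; the calls themselves are made up.
        print(consensus_call(["mesenchymal", "mesenchymal", "immunoreactive"]))
        print(consensus_call(["proliferative", "differentiated", "immunoreactive"]))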

    Factors influencing the efficiency of generating genetically engineered pigs by nuclear transfer: multi-factorial analysis of a large data set

    Background: Somatic cell nuclear transfer (SCNT) using genetically engineered donor cells is currently the most widely used strategy to generate tailored pig models for biomedical research. Although this approach facilitates a similar spectrum of genetic modifications as in rodent models, the outcome in terms of live cloned piglets is quite variable. In this study, we aimed at a comprehensive analysis of the environmental and experimental factors that substantially influence the efficiency of generating genetically engineered pigs. Based on a large data set from 274 SCNT experiments (in total 18,649 reconstructed embryos transferred into 193 recipients), performed over a period of three years, we assessed the relative contribution of season, type of genetic modification, donor cell source, number of cloning rounds, and pre-selection of cloned embryos for early development to cloning efficiency. Results: 109 (56%) recipients became pregnant and 85 (78%) of them gave birth to offspring. Out of 318 cloned piglets, 243 (76%) were alive, but only 97 (40%) were clinically healthy and showed normal development. The proportion of stillborn piglets was 24% (75/318), and another 31% (100/318) of the cloned piglets died soon after birth. The overall cloning efficiency, defined as the number of offspring born per SCNT embryo transferred (counting only recipients that delivered), was 3.95%. SCNT experiments performed during winter using fetal fibroblasts or kidney cells after additive gene transfer resulted in the highest number of live and healthy offspring, while two or more rounds of cloning and nuclear transfer experiments performed during summer decreased the number of healthy offspring. Conclusion: Although the effects of individual factors may differ between laboratories, our results and analysis strategy will help to identify and optimize the factors that are most critical to cloning success in programs aiming at the generation of genetically engineered pig models.
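
    The reported efficiency figure follows directly from its definition. The short sketch below back-calculates the unstated denominator (embryos transferred to delivering recipients) from the published 318 piglets and 3.95%; that derived count is an inference, not a number reported above.

        def cloning_efficiency(offspring_born, embryos_transferred):
            # Offspring born per SCNT embryo transferred, restricted to
            # recipients that delivered (the study's definition).
            return offspring_born / embryos_transferred

        # 318 piglets at 3.95% efficiency implies ~8,051 embryos transferred
        # to delivering recipients; this denominator is back-calculated.
        implied_embryos = round(318 / 0.0395)
        print(implied_embryos, f"{cloning_efficiency(318, implied_embryos):.2%}")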

    Strengthening The Organization and Reporting of Microbiome Studies (STORMS): A Reporting Checklist for Human Microbiome Research

    Background: Human microbiome research is a growing field with the potential to improve our understanding and treatment of diseases and other conditions. The field is interdisciplinary, making concise organization and reporting of results across the different styles of epidemiology, biology, bioinformatics, translational medicine, and statistics a challenge. Commonly used reporting guidelines for observational or genetic epidemiology studies lack key features specific to microbiome studies. Methods: A multidisciplinary group of microbiome epidemiology researchers reviewed elements of available reporting guidelines for observational and genetic studies and adapted these for application to culture-independent human microbiome studies. New reporting elements were developed for laboratory, bioinformatic, and statistical analyses tailored to microbiome studies, and other parts of these checklists were streamlined to keep reporting manageable. Results: STORMS is a 17-item checklist for reporting on human microbiome studies, organized into six sections covering the typical sections of a scientific publication. It is presented as a table with space for author-provided details and is intended for inclusion in supplementary materials. Conclusions: STORMS provides guidance for authors and standardization for interdisciplinary microbiome studies, facilitating complete and concise reporting and augmenting information extraction for downstream applications. Availability: The STORMS checklist is available as a versioned spreadsheet from https://www.stormsmicrobiome.org/

    Genetics Meets Metabolomics: A Genome-Wide Association Study of Metabolite Profiles in Human Serum

    The rapidly evolving field of metabolomics aims at a comprehensive measurement of ideally all endogenous metabolites in a cell or body fluid. It thereby provides a functional readout of the physiological state of the human body. Genetic variants that associate with changes in the homeostasis of key lipids, carbohydrates, or amino acids are not only expected to display much larger effect sizes, due to their direct involvement in metabolite conversion and modification, but should also provide access to the biochemical context of such variations, in particular when enzyme-coding genes are concerned. To test this hypothesis, we conducted what is, to the best of our knowledge, the first GWA study with metabolomics, based on the quantitative measurement of 363 metabolites in the serum of 284 male participants of the KORA study. We found associations of frequent single nucleotide polymorphisms (SNPs) with considerable differences in the metabolic homeostasis of the human body, explaining up to 12% of the observed variance. Using ratios of certain metabolite concentrations as a proxy for enzymatic activity, up to 28% of the variance can be explained (p-values of 10⁻¹⁶ to 10⁻²¹). We identified four genetic variants in genes coding for enzymes (FADS1, LIPC, SCAD, MCAD) where the corresponding metabolic phenotype (metabotype) clearly matches the biochemical pathways in which these enzymes are active. Our results suggest that common genetic polymorphisms induce major differences in the metabolic make-up of the human population. This may lead to a novel approach to personalized health care based on a combination of genotyping and metabolic characterization. These genetically determined metabotypes may underlie the risk for a certain medical phenotype, the response to a given drug treatment, or the reaction to a nutritional intervention or environmental challenge.
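
    The "variance explained" figures rest on regressing a trait, here a log-ratio of two metabolite concentrations used as a proxy for enzymatic activity, on allele dosage and reading off R². The sketch below uses simulated data with a hypothetical effect size; it illustrates the calculation, not the study's actual analysis pipeline.

        import numpy as np

        def variance_explained(dosage, trait):
            # R^2 of a simple linear regression of the trait on allele
            # dosage (0/1/2 copies of the minor allele, additive model).
            r = np.corrcoef(dosage, trait)[0, 1]
            return r ** 2

        # Simulated cohort of 284 (matching the study size): allele dosages
        # and a log metabolite ratio with an additive genetic effect.
        rng = np.random.default_rng(1)
        dosage = rng.integers(0, 3, size=284)
        log_ratio = 0.4 * dosage + rng.normal(0.0, 1.0, size=284)
        print(f"R^2 = {variance_explained(dosage, log_ratio):.2f}")

    Ratios of product to substrate concentrations tend to cancel shared environmental variation, which is why they can expose a stronger genetic signal than either metabolite alone.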